Semi-Automated Elicitation Corpus Generation

Authors

  • Alison Alvarez
  • Lori Levin
  • Robert Frederking
  • Erik Peterson
Abstract

This document describes a semi-automated process for creating elicitation corpora. An elicitation corpus is translated by a bilingual consultant in order to produce high-quality, word-aligned sentence pairs. The corpus sentences are generated automatically from detailed feature structures using the GenKit generation program. The feature structures are themselves generated automatically from information that a linguist provides through our corpus specification software. This process helps us build small, flexible corpora for testing and developing machine translation systems.
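To make the first stage of this pipeline concrete, the sketch below shows one way a linguist's specification could be expanded into feature structures: every combination of the listed feature values becomes one flat feature structure. This is only an illustration under stated assumptions. The feature names, the values, and the function `enumerate_feature_structures` are hypothetical, and the code does not reproduce the authors' corpus specification software or any GenKit interface.

```python
from itertools import product

# Hypothetical specification: feature names and values are illustrative only,
# not taken from the paper or from the authors' specification software.
SPEC = {
    "subject_person": ["1", "2", "3"],
    "subject_number": ["sg", "pl"],
    "tense": ["past", "present"],
}


def enumerate_feature_structures(spec):
    """Return one flat feature structure (a dict) per combination of values."""
    names = list(spec)
    value_lists = [spec[name] for name in names]
    # itertools.product yields the Cartesian product of the value lists.
    return [dict(zip(names, values)) for values in product(*value_lists)]


feature_structures = enumerate_feature_structures(SPEC)

# Each structure would then be handed to a sentence generator; that step is
# not modeled here.
for fs in feature_structures[:3]:
    print(fs)
```

With three, two, and two values respectively, this specification yields twelve feature structures. A real specification would presumably also need a way to exclude unwanted combinations so that the corpus stays small.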


Similar resources

Challenges in Automated Elicitation of a Controlled Bilingual Corpus

In this paper we will address an uncommon but important approach to automated learning for MT, namely learning of translation rules from carefully elicited sentences. The approach is uncommon for good reason — anyone who has tried linguistic field work knows that elicitation will go awry if not carefully monitored by a human. We will address eight challenges of automated elicitation and discuss...


Making Requirements Specifications Accessible via Logic, Language and Graphics: a Progress Report

Natural language software tools may have an important role in making requirements specifications more accessible. Possible tools include text processors to support requirements elicitation, and text generators to support requirements validation. The current paper reports on our progress in developing a natural language generation system, integrating this tool with a graphical interface and an a...


Named entity recognition for automated test case generation

Testing is the process of evaluating a software or hardware against its requirement specification. It helps to verify and grade a given system. Recent emphasis on Test Driven Development (TDD) has increased the need for testing from the early stages of software development. System test cases can be obtained from a number of user specifications such as functional requirements; UML diagrams and u...


Automated Methods for Estimating Baseflow from Streamflow Records in a Semi Arid Watershed

Understanding of the runoff generation processes is important in understanding the magnitude and dynamics of groundwater discharge. However, these processes continue to be difficult to quantify and conceptualize. In this study, two digital filter based separation modules, the Recursive filtering method (RDF) and a generalization of the recursive digital filter (GRDF) were 1991–2002 in the Hableh Ro...



Publication date: 2005